-
Augmented reality (AR) enhances user interaction with the real world but also presents vulnerabilities, particularly through Visual Information Manipulation (VIM) attacks. These attacks alter important real-world visual cues, leading to user confusion and misdirected actions. In this demo, we present a hands-on experience using a miniature city setup, where users interact with manipulated AR content via the Meta Quest 3. The demo highlights the impact of VIM attacks on user decision-making and underscores the need for effective security measures in AR systems. Future work includes a user study and cross-platform testing.
-
Augmented Reality (AR) enhances the real world by integrating virtual content, yet ensuring the quality, usability, and safety of AR experiences presents significant challenges. Could Vision-Language Models (VLMs) offer a solution for the automated evaluation of AR-generated scenes? In this study, we evaluate the capabilities of three state-of-the-art commercial VLMs -- GPT, Gemini, and Claude -- in identifying and describing AR scenes. For this purpose, we use DiverseAR, the first AR dataset specifically designed to assess VLMs' ability to analyze virtual content across a wide range of AR scene complexities. Our findings demonstrate that VLMs are generally capable of perceiving and describing AR scenes, achieving a True Positive Rate (TPR) of up to 93% for perception and 71% for description. While they excel at identifying obvious virtual objects, such as a glowing apple, they struggle with seamlessly integrated content, such as a virtual pot with realistic shadows. Our results highlight both the strengths and limitations of VLMs in understanding AR scenarios. We identify key factors affecting VLM performance, including virtual content placement, rendering quality, and physical plausibility. This study underscores the potential of VLMs as tools for evaluating the quality of AR experiences.
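To make the evaluation setting concrete, below is a minimal sketch of asking a commercial VLM whether an image contains virtual (AR) content. It assumes the OpenAI Python client; the prompt wording, model name, and answer parsing are illustrative assumptions, not the study's actual protocol.

```python
# Minimal sketch: asking a commercial VLM whether an AR scene contains
# virtual content. Assumes the OpenAI Python client; prompt, model name,
# and parsing are illustrative, not the study's protocol.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def contains_virtual_content(image_path: str) -> bool:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Does this image contain computer-generated (AR) "
                         "virtual content? Answer 'yes' or 'no', then briefly "
                         "describe any virtual objects."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer.startswith("yes")
```

With per-scene ground-truth labels, the perception TPR is then simply the fraction of scenes known to contain virtual content for which the model answers "yes".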
-
In Augmented Reality (AR), virtual content enhances the user experience by providing additional information. However, improperly positioned or designed virtual content can be detrimental to task performance, as it can impair users' ability to accurately interpret real-world information. In this paper, we examine two types of task-detrimental virtual content: obstruction attacks, in which virtual content prevents users from seeing real-world objects, and information manipulation attacks, in which virtual content interferes with users' ability to accurately interpret real-world information. We provide a mathematical framework to characterize these attacks and create a custom open-source dataset for attack evaluation. To address these attacks, we introduce ViDDAR (Vision language model-based Task-Detrimental content Detector for Augmented Reality), a comprehensive full-reference system that leverages Vision Language Models (VLMs) and advanced deep learning techniques to monitor and evaluate virtual content in AR environments, employing a user-edge-cloud architecture to balance performance with low latency. To the best of our knowledge, ViDDAR is the first system to employ VLMs for detecting task-detrimental content in AR settings. Our evaluation results demonstrate that ViDDAR effectively understands complex scenes and detects task-detrimental content, achieving up to 92.15% obstruction detection accuracy with a detection latency of 533 ms, and up to 82.46% information manipulation detection accuracy with a latency of 9.62 s.
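The "full-reference" idea can be illustrated with a simplified obstruction check: real-world objects detected in the raw (reference) view that no longer match any detection in the augmented view are flagged as obstructed. The `detect`-style inputs below are hypothetical placeholders for whatever detector or VLM is used, and the IoU threshold is illustrative, not ViDDAR's actual parameter.

```python
# Simplified full-reference obstruction check: real-world objects detected
# in the raw image but missing from the augmented image are flagged.
# Detection inputs are hypothetical placeholders; the IoU threshold is
# illustrative, not ViDDAR's actual parameter.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def find_obstructed(raw_dets: List[Tuple[str, Box]],
                    aug_dets: List[Tuple[str, Box]],
                    iou_thresh: float = 0.5) -> List[Tuple[str, Box]]:
    """Return raw-image detections with no matching detection (same label,
    IoU above threshold) in the augmented image."""
    obstructed = []
    for label, box in raw_dets:
        matched = any(l == label and iou(box, b) >= iou_thresh
                      for l, b in aug_dets)
        if not matched:
            obstructed.append((label, box))
    return obstructed
```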
-
Inaccurate spatial tracking in extended reality (XR) headsets can cause virtual object jitter, misalignment, and user discomfort, limiting the headsets' potential for immersive content and natural interactions. We develop a modular testbed to evaluate the tracking performance of commercial XR headsets, incorporating system calibration, tracking data acquisition, and result analysis, and allowing the integration of external cameras and IMU sensors for comparison with open-source VI-SLAM algorithms. Using this testbed, we quantitatively assessed spatial tracking accuracy under various user movements and environmental conditions for the latest XR headsets, the Apple Vision Pro and Meta Quest 3. The Apple Vision Pro outperformed the Meta Quest 3, reducing relative pose error (RPE) and absolute pose error (APE) by 33.9% and 14.6%, respectively. While both headsets achieved sub-centimeter RPE in most cases, they exhibited APE exceeding 10 cm in challenging scenarios, highlighting the need for further improvements in reliability and accuracy.
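For reference, the two error metrics can be computed from time-aligned estimated and ground-truth trajectories roughly as shown below. This is a translation-only sketch (the full metrics also involve rotations and trajectory alignment), not the testbed's exact implementation.

```python
# Translation-only sketch of absolute pose error (APE) and relative pose
# error (RPE) over time-aligned trajectories of shape (N, 3). The full
# metrics also account for rotation and SE(3) alignment; this is
# illustrative, not the testbed's exact implementation.
import numpy as np

def ape_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """Root-mean-square absolute translational error."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

def rpe_rmse(est: np.ndarray, gt: np.ndarray, delta: int = 1) -> float:
    """Root-mean-square error of relative motion over a fixed frame gap."""
    d_est = est[delta:] - est[:-delta]
    d_gt = gt[delta:] - gt[:-delta]
    return float(np.sqrt(np.mean(np.sum((d_est - d_gt) ** 2, axis=1))))
```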
-
Obstruction attacks in Augmented Reality (AR) pose significant challenges by obscuring critical real-world objects. This work demonstrates the first implementation of obstruction detection on a video see-through head-mounted display (HMD), the Meta Quest 3. Leveraging a vision language model (VLM) and a multi-modal object detection model, our system detects obstructions by analyzing both raw and augmented images. Because access to the raw camera feed is limited, the system employs an image-capturing approach using Oculus casting, capturing a sequence of images and identifying the raw image among them. Our implementation shows the feasibility of effective obstruction detection in AR environments and highlights future opportunities for improving real-time detection through enhanced camera access.
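Since the headset's raw camera feed is not directly exposed, frames have to be grabbed from the cast stream. The sketch below captures such a sequence with OpenCV, assuming the Oculus cast is exposed to the host as a video capture device (e.g., via a capture card or virtual camera); the device index and frame count are illustrative assumptions, and picking the raw, un-augmented frame is left to the downstream models.

```python
# Minimal sketch of capturing a frame sequence from an Oculus cast,
# assuming the cast is exposed to the host as a video capture device
# (e.g., a capture card or virtual camera). Device index and frame count
# are illustrative assumptions.
import cv2

def capture_sequence(device_index: int = 0, num_frames: int = 30):
    cap = cv2.VideoCapture(device_index)
    frames = []
    try:
        for _ in range(num_frames):
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
    finally:
        cap.release()
    return frames  # downstream models identify the raw (un-augmented) frame
```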
-
Augmented reality (AR) platforms now support persistent, markerless experiences, in which virtual content appears in the same place relative to the real world, across multiple devices and sessions. However, optimizing environments for these experiences remains challenging; virtual content stability is determined by the performance of device pose tracking, which depends on recognizable environment features, but environment texture can impair human perception of virtual content. Low-contrast 'invisible textures' have recently been proposed as a solution, but may result in poor tracking performance when combined with dynamic device motion. Here, we examine the use of invisible textures in detail, starting with the first evaluation in a realistic AR scenario. We then consider scenarios with more dynamic device motion, and conduct extensive game engine-based experiments to develop a method for optimizing invisible textures. For texture optimization in real environments, we introduce MoMAR, the first system to analyze motion data from multiple AR users and generate guidance using situated visualizations. We show that MoMAR can be deployed while maintaining an average frame rate > 59 fps for five different devices. We demonstrate the use of MoMAR in a realistic case study; our optimized environment texture allowed users to complete a task significantly faster (p=0.003) than a complex texture.
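As a toy illustration of the 'invisible texture' idea, the sketch below generates a low-contrast noise texture: small-amplitude random intensity variation around a base gray value, so that trackable features exist while remaining barely perceptible. The base value, amplitude, and resolution are illustrative assumptions, not MoMAR's optimized parameters.

```python
# Toy sketch of a low-contrast "invisible" texture: small-amplitude random
# noise around a base gray value, giving the tracker features while staying
# barely perceptible. Base value, amplitude, and size are illustrative
# assumptions, not MoMAR's optimized parameters.
import numpy as np
from PIL import Image

def invisible_texture(size: int = 1024, base: int = 200, amplitude: int = 4):
    noise = np.random.randint(-amplitude, amplitude + 1, (size, size))
    tex = np.clip(base + noise, 0, 255).astype(np.uint8)
    return Image.fromarray(tex, mode="L")

invisible_texture().save("invisible_texture.png")
```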
-
3D object detection (OD) is a crucial element in scene understanding. However, most existing 3D OD models have been tailored to work with light detection and ranging (LiDAR) and RGB-D point cloud data, leaving their performance on commonly available visual-inertial simultaneous localization and mapping (VI-SLAM) point clouds unexamined. In this paper, we create and release two datasets: VIP500, 4772 VI-SLAM point clouds covering 500 different object and environment configurations, and VIP500-D, an accompanying set of 20 RGB-D point clouds for the object classes and shapes in VIP500. We then use these datasets to quantify the differences between VI-SLAM point clouds and dense RGB-D point clouds, as well as the discrepancies between VI-SLAM point clouds generated with different object and environment characteristics. Finally, we evaluate the performance of three leading OD models on the diverse data in our VIP500 dataset, revealing the promise of OD models trained on VI-SLAM data; we examine the extent to which both object and environment characteristics impact performance, along with the underlying causes.
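One simple way to quantify how sparse a VI-SLAM point cloud is relative to a dense RGB-D point cloud is to compare point counts and nearest-neighbor spacing. The sketch below assumes both clouds are available as (N, 3) NumPy arrays; the file names and statistics are illustrative, not the paper's actual analysis.

```python
# Sketch comparing a sparse VI-SLAM point cloud with a dense RGB-D point
# cloud via point count and median nearest-neighbor spacing. Assumes both
# clouds are (N, 3) NumPy arrays; file names and statistics are
# illustrative, not the paper's analysis.
import numpy as np
from scipy.spatial import cKDTree

def cloud_stats(points: np.ndarray) -> dict:
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=2)  # k=2: nearest point other than self
    return {"num_points": len(points),
            "median_nn_spacing": float(np.median(dists[:, 1]))}

vislam_cloud = np.load("vislam_points.npy")  # hypothetical file names
rgbd_cloud = np.load("rgbd_points.npy")
print("VI-SLAM:", cloud_stats(vislam_cloud))
print("RGB-D:  ", cloud_stats(rgbd_cloud))
```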